Since there has been much discussion of Virtual Memory (VM) schemes 
in the context of Alto "Gold Coins", it seemed appropriate that the 
desires of the Mesa group be specified. This memo is a reasonably high 
level description of the characteristics which would be helpful to Mesa 
in a VM scheme. 

Mesa Object Files 

A Mesa Object Module (file) is composed of two main parts, each of 
which is a contiguous group of pages in the file: 



    +----------------+
    |  header info   |
    |  object code   |   n pages
    |  literals      |
    +----------------+
    |  symbol        |
    |  tables        |   s pages
    +----------------+



Generally, s is larger than n, and sometimes s is twice as large as n. 

The action of "loading" a Mesa module requires only that the code 
pages be mapped into memory. The code is never altered in any way - all 
external connections (generally procedures and ports) are in the data 
associated with a program. A Mesa routine (procedure or coroutine) is 
uniquely identified by a frame containing its state and local variables, 
and many routines may share the code in an object module. The information 
which allows Mesa to associate a symbol table and code with a routine is 
kept in a frame called a creator frame. The act of "declaring" an object 
file to Mesa causes a creator frame for that file to be made. Instances 
of that module can then be created simply by transferring control to its 
creator; the result is a new routine. The cost of creating new instances 
is roughly comparable with the cost of a procedure call. 

The header information in the code block of an object file contains 
sufficient information to enable the Mesa loader to manufacture a creator 
for it. 

Abstractions for Managing (Overlaying) Code Blocks 

A contiguous set of file pages is called a page group in Mesa, and 
is identified by three values: 

    filehandle: some handle on a file by which the pages of the file 
                can be named (for example, a JFN in Tenex); 
    pagebase:   the page in the file which corresponds to page zero in 
                the page group; 
    size:       the number of pages in the group. 

A page group is accessed by a PG-handle (actually a protected [sealed] 
pointer), and page groups may be created, destroyed, and swapped into 
memory or out. More than one page group may, in principle, be associated 
with the same pages in a file. On Tenex this is accomplished by PMAPping 
the appropriate file pages into the Tenex VM. One special kind of page 
group, called a Window group, is provided: the pagebase and size 
attributes of such a group are alterable after it is initially set up, and 
it can thus "window" pages of the file to which it is attached. 
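
The page group abstraction can be pictured with a small sketch. It is only an illustration in Python, not Mesa code: the class names, the `file_page` and `move` methods, and the file name are all hypothetical, and only the three attributes (filehandle, pagebase, size) come from the description above.

```python
from dataclasses import dataclass


@dataclass
class PageGroup:
    """A contiguous run of file pages (illustrative names only)."""
    filehandle: str   # any handle naming the file, e.g. a JFN on Tenex
    pagebase: int     # file page that corresponds to page zero of the group
    size: int         # number of pages in the group

    def file_page(self, group_page: int) -> int:
        """Translate a page number within the group into a file page number."""
        if not 0 <= group_page < self.size:
            raise IndexError("page lies outside the group")
        return self.pagebase + group_page


class WindowGroup(PageGroup):
    """A page group whose pagebase and size may be altered after setup."""

    def move(self, pagebase: int, size: int) -> None:
        self.pagebase = pagebase
        self.size = size


# Hypothetical module file: 3 code pages followed by symbol table pages.
code = PageGroup("module.obj", pagebase=0, size=3)
window = WindowGroup("module.obj", pagebase=3, size=2)
window.move(pagebase=5, size=4)   # slide the window further into the file
```

Two groups may name overlapping pages of the same file, which is why the sketch keeps the file handle inside each group rather than treating a group as the owner of its pages.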

The code and symbol table parts of a Mesa module are each modelled as 
a fixed page group by the Mesa loader, overlaying and debugging software. 
The main implication of this is that Mesa code is "swapped" (overlayed) in 
units corresponding to the code in a single module. Symbol tables, being 
separately swappable, only consume VM space when needed by debugging or 
dynamic (LISP-like) binding mechanisms. 

The following diagram portrays the relationships between the Tenex 
page-grouped VM as seen by Mesa, File memory, and Maxc disk storage (see 
next page, please): 
One obvious conclusion from this diagram is that the logical grouping 
of file pages which Mesa considers useful is not at all maintained at the 
level of disk storage. A second observation is that the Tenex-provided VM 
is being treated as a real memory into which a much larger VM, composed 
of page groups, is mapped. For obvious efficiency reasons, on Tenex, only 
Mesa code and symbol tables are considered swappable because pointers to 
them can be controlled by software (for not-in-memory traps, etc.). 

Mesa/Alto 

We would like to propose a VM scheme for Altos which supports a 
structure such as the following: 

[Diagram: Alto VM]

The main things to notice about this second diagram are the following: 

(1) As well as logical page groups in VM, there are real page groups in 
files which correspond to real page groups on disk storage; 

(2) A physical page group represents an indivisible disk-to-real memory 
unit of transfer; 

(3) Logical page groups do not have to correspond to physical page groups, 
but there can be considerable performance payoffs when they do, or 
when they require a small number of physical page groups. This is 
because the definition of a physical page group includes the fact that 
reading it requires a minimal number of disk head seeks and disk 
rotations. 

(4) If all physical page groups are size p, the scheme is very similar to 
Tenex's VM, with page size = (p*256) words. 

Sundry Details 

Disk Bad Spots 

The disks on Altos do have bad spots. If a bad spot develops in a 
physical page group, one could either 

(a) change the single page group into (at most) three page groups, 
one of which is a newly allocated single disk page to take over for 
the page which is unusable; 

(b) move the page group as a whole to a new contiguous area and 
free the previous area, except for the bad page, which is put into a 
file of bad pages called BadSpots. 

I favor solution (b) since it uses machinery which will have to exist in 
any case, and because there are very few bad spots. Its advantage is that 
physical page groups are never fragmented; this might be very valuable if 
someone is treating the VM as consisting of fixed size pages as mentioned 
previously. 

Private Memory 

The memory private to a VM should just reside in a PrivateMemory file, 
but without all the fiction maintained by Tenex to pretend that the JOBPMF 
for a job is really a file when it is not. 

Sharing VM 

Virtual memory is only sharable by sharing files; i.e., by mapping a 
file name string into a logical page group in one's own VM. 
On June 12 and 13, the group actively involved in the implementation of 
Mesa (Geschke, Mitchell, Satterthwaite, Sweet) met to consider the 
problem of creating a version of Mesa for the Alto. Our conclusions 
about the major steps as well as time estimates for each are summarized 
below. An attachment summarizes the more important interdependencies of 
the steps and indicates a possible division of the work among the members 
of the group. 

We believe that certain additions and changes to TENEX Mesa are essential 
before Mesa can be moved to the Alto. Most of these are well understood; 
we propose to implement them in parallel with design of an interpretive 
system for both MAXC and the Alto. 

TENEX Mesa 

(1) New version of the segmentation machinery (3 weeks) 

(a) complete and test new SEGRUN and associated modules 

(b) modify debugger, loader, and bootstrapper 

(c) change the compiler to produce modules with new symbol 
table formats, expanded initialization code 

(d) ALSO: extend the compiler to allow arbitrary named types 

(2) Finish control structures (4 weeks) 

(a) implement support for the control primitives 

(b) change compiler's code generators 

(c) modify debugger, binder, loader, error handling 

(3) General cleanup (3 weeks) 

(a) control structures cleanup 

(b) implement INCLUDEd program modules, revised binding 
mechanism 

(c) introduction of simple constructors and of sets as data 
types 

(4) Documentation (indefinite) 

Alto Mesa 

(5) Design the interpretive machine (8 weeks) 

(a) instruction set 

(b) interpretive engine 

(c) Alto microcode feasibility study 

(6) Implement Interpretive Mesa (i-Mesa) for TENEX (8 weeks) 

(a) make an i-Mesa TENEX compiler 

(b) make a TENEX i-Mesa interpretive engine 

(c) allow i-Mesa and c-Mesa modules to interact 

(d) make a complete i-Mesa system for TENEX 

Note that this will not be quite identical to an i-Mesa 
system for Alto (e.g., 36 vs. 16 bit words, different 
operating system services, etc.) 

(7) Move i-Mesa to the Alto (8 weeks) 

(a) write a simulator of the Alto operating system interface 
for TENEX 

(b) alter low-level routines of i-Mesa to match the Alto 

(c) modify i-Mesa compiler to produce code for the Alto's 
interpretive engine 

(d) modify the TENEX interpretive engine to accept Alto i-Mesa 

(e) make a complete i-Mesa system for Alto (running under 
simulation by TENEX Mesa) 

(f) write an interpretive engine in BCPL 

(g) transfer i-Mesa system from TENEX to Alto 

(8) Move i-Mesa interpreter to Alto microcode (indefinite) 

External Constraints 

Early in the design of the interpretive machine (step 5) we need to 
understand the basic facilities to be provided by the kernel 
operating system for the Alto. 

Prior to steps 7a and 7b, we will need a precise definition of that 
operating system's behavior and interfaces. 

Prior to step 7f, we will need one or more dedicated Altos with 
reliable and reasonably complete (but not highly tuned) utilities and 
operating system. Easy access to an Alto for familiarizing ourselves 
with the utilities and service routines would be helpful toward the 
end of step 5. 



Notes 



These time estimates, which are thought to be somewhere between 
realistic and optimistic, imply that a slow but usable version of 
Mesa could be running on Altos by the end of 1974 (with some luck and 
few diversions). 

They also imply that, with present manpower commitments, further 
investigation of the substantive issues in the design of Mesa data 
structuring facilities as well as the implementation of any solutions 
will be pushed well into 1975. 

During the remainder of 1974, we would like to encourage others to 
begin using TENEX Mesa to the extent that this is possible without a 
major effort to produce additional documentation. 
Inter-Office Memorandum 

To: Mesa and Alto Groups                       Date: August 21, 1974 

From: Ben Wegbreit, Chuck Geschke              Loc: Palo Alto 

Subject: The Implementation of Mesa on Alto    Org.: PARC/CSL 



During the past six weeks, a number of measurements have been 
performed using Tenex Mesa with an eye toward designing and predicting 
the performance of a Mesa virtual machine emulated by Alto microcode. 
The purpose of this memo is to outline the method used in this analysis and 
explain some of the results. The measurements can be divided into two 
classes: dynamic and static. The static measurements were collected as a 
basis for deciding how to design a compact representation of Alto-Mesa 
object code. The results of this static analysis are presented in section VI 
at the end of this memo. The dynamic statistics were gathered (1) to 
analyze the compatibility of the Mesa virtual machine and the virtual 
memory scheme proposed for Alto by Wegbreit et al. and (2) to give a 
feeling for the performance degradation/improvement in moving from Maxc 
to an Alto. We begin by giving a brief overview of the essential 
components of the Mesa virtual machine. 



I. The Mesa Virtual Machine 

The Mesa machine consists of a number of system controlled 
registers used to point at a program module's global (own) data and code 
and to point at the frame of a currently active procedure. In addition the 
user is allotted some fixed number of registers for pointing at his privately 
managed data regions. These registers will be implemented via base 
registers of the virtual memory hardware. A Mesa program module 
consists of a collection of subroutines and (potentially) a main body. To 
facilitate subsequent discussion, let's define the following terms: 

Greg: pointer to globals of entire program 

Creg: pointer to code for main body and all procedures defined 
in a module 

Dreg: pointer to the "own" data of a program module 

Freg: pointer to the "local" data of the currently active 
procedure, i.e. the frame pointer 

Uregs: pointers (user computed) to data. 
Given the notion of pagegroups in the proposed Alto virtual memory scheme, 
it is desirable to minimize the number of times that various "base" registers 
must be reloaded in moving between pagegroups. More precisely, a base 
register will be said to "fault" when its bounds registers must be reloaded. 
Let's assume that the above-mentioned Mesa registers are implemented as 
base registers on the Alto. Further, let's assume that each Mesa module is 
treated as a pagegroup. Then it follows that as long as program control 
remains within a given module, Creg and Dreg will not fault. Uregs, of 
course, can fault at any time. What about Freg? Well, if one notes (as is 
true in the case of Tenex Mesa programs) that the total active frame 
space never grows very large, then a pagegroup can be allocated for frame 
usage so that Freg seldom, if ever, faults. Of course, when a procedure 
call occurs, Freg must be changed so that it bases the new frame. This 
requires reloading Freg; however, its bounds registers are unchanged and 
no fault occurs. Indeed, with the observation that module own variables 
also occupy a limited amount of storage in most cases, one can reasonably 
allocate them from the same pagegroup as frames. Thus Dreg also seldom, 
if ever, faults. Greg bases data common to the entire program; this 
includes the transfer vector and other similar data. Since this is 
reasonably small, it should be implemented as a single page group. Hence, 
Greg never faults. 
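
The notion of a base register "fault" can be made concrete with a small sketch. It is an illustration only: the function names, the trace, and the 1024-word pagegroup size are invented for the example, and the sketch assumes that the initial load of a register is not counted as a fault.

```python
def count_base_register_faults(refs, group_of):
    """Count faults per register over a trace of (register, address) pairs.

    A register faults when it is used to reach a pagegroup different from
    the one it reached on its previous use (its bounds must be reloaded).
    The very first use of a register is not counted here (an assumption).
    """
    current = {}   # register name -> pagegroup it currently bases
    faults = {}    # register name -> number of faults seen
    for reg, addr in refs:
        group = group_of(addr)
        if reg in current and current[reg] != group:
            faults[reg] = faults.get(reg, 0) + 1
        current[reg] = group
    return faults


def group_of(addr):
    # Hypothetical layout: every pagegroup is 1024 words long.
    return addr // 1024


trace = [
    ("Freg", 100), ("Freg", 120),      # frames stay in one pagegroup
    ("Ureg", 5000), ("Ureg", 5010),    # user pointer stays put ...
    ("Ureg", 9000),                    # ... then roams: fault
    ("Freg", 130),
    ("Ureg", 5000),                    # roams back: fault
]
faults = count_base_register_faults(trace, group_of)
```

In this trace Freg is reloaded with new values but never leaves its pagegroup, so only the user register accumulates faults.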



II. Method of Dynamic Statistics Gathering 

Fortunately Tenex Mesa is a reasonable vehicle for measuring the 
behavior of such a proposed model. It already exhibits an architecture 
consisting of Creg, Freg, and Dreg and allows acceptable analysis of Uregs. 
A PDP-10 emulator, created at Harvard and modified at BBN to measure 
Lisp, has been modified to be a suitable instrument for measuring Mesa. 
Every instruction fetch and non-instruction-fetch memory access is trapped 
in the emulator's interpretive loop and a call made to the 
statistics-gathering routine(s). All the various dynamic measures (while 
presented separately) were gathered in parallel. 



One of the first observations noted was that roughly 40% of the 
instructions being executed on Maxc were in subroutine call and return 
sequences. Clearly this points to the necessity of micro-emulated call and 
return instructions for Alto-Mesa. Those sequences also introduced a great 
deal of noise into the data collected. Hence the emulator was 
re-structured to account for call and return sequences in a special manner 
which will be discussed in more detail later on. 

The measuring of Uregs is confused somewhat by the fact that the 
present Tenex-Mesa compiler allocates user ac's in a stack-like fashion. 
Almost any other allocation discipline of these six ac's would produce 
better utilization for the purposes of this model. As a result the Ureg 
utilization is measured in two ways. The first assumes two ac's, one for 
reading and the other for writing (as in the Sturgis BCPL study). The 
second simply takes the six PDP-10 ac's as allocated in the Tenex object 
code for the applicable read or write operation. The experiments show 
nearly no difference in the behavior of the two modes. A more reasonable 
allocation scheme like LRU could be presumed to perform better but there 
does not appear to be an easy method to measure its effect. 



III. The Mesa machine and address translation 

In order to explain the results of the study of 
Mesa/Alto-virtual-memory compatibility, let's examine the test data 
gathered for two very different Mesa programs. Example A is the MPL 
compiler compiling a large (36k character) source file and example B is a 
formatting program (Ed Satterthwaite) handling a (107k character) text 
file. 



(A) Compiling NEWMPLEXP.NLS 

Instructions emulated:   14666840 

Call inst:                6053898    Non-call inst:            8612942 
Non-call memory refs:     6770596    Intra-call memory refs:   4??1902 

Module faults:             228633    Local calls:               155130 
Params:                    499304    Ret vals:                  290715 

Total calls:               383763 

User:        R         W     Total      RFLT      WFLT    Faults 
       1282887    450158   1733045    529626    157911    687537 

AC#:         R         W     Total      RFLT      WFLT    Faults 
  0      27300    260859    294219     12594     12187     24781 
  1     778790     89722    868512    362662     73421    436083 
  2     279231    166488    445719     91163     54905    146068 
  3     205313         0    205313     74233         0     74233 
  4         36         0        36         1         0         1 
  5          0         0         0         0         0         0 
  6          0         0         0         0         0         0 

Total mem refs: 20592966 

Faults on AC's 0 thru 6 memrefs:   1138432   5.53 
Faults on 2-cache memrefs:         1144803   5.60 

(B) Formatting of ch7 

Instructions emulated:   11398563 

Call inst:                3286631    Non-call inst:            8111932 
Non-call memory refs:     5803251    Intra-call memory refs:   2415508 

Module faults:             141258    Local calls:                54737 
Params:                    187543    Ret vals:                   94557 

Total calls:               195995 

User:        R         W     Total      RFLT      WFLT    Faults 
        398745    274688    673433     99843    112275    212118 

AC#:         R         W     Total      RFLT      WFLT    Faults 
  0     122051    170230    301009    110060    111610    225610 
  1     232525      1075    230100     10132      1561     19996 
  2      25607     95959    121566      2121       795      2919 
  3      17699         0     17699       711         0       711 
  4        162         0       162         1         0         1 
  5          0         0         0         0         0         0 
  6          0         0         0         0         0         0 

Total mem refs: 16722681 

Faults on AC's 0 thru 6 memrefs:    531709   3.18 
Faults on 2-cache memrefs:          494634   2.96 



Notes: 

(1) The runs chosen were very long because shorter runs (260k 
instructions) were dominated by initialization and user interaction with the 
Mesa debugger. The compiler was run on several long sources and 
demonstrated very uniform behavior. 

(2) Notice that instructions in call/return overhead change from 40% to 
30% between the compiler and formatter. The compiler uses a Tree Meta 
parser which translates into long sequences of subroutine calls. The 
emulator distinguishes those memory references which occur during 
call/return from those which do not. The intra-call/return memory 
references are accounted for separately as will be described later. 

(3) Procedure calls are divided into two classes: local calls and module 
faults (i.e. external calls). Parameters and returned values were counted 
but not used in the analysis. 

(4) The non-call/return memory references which are user computed (as 
opposed to frame references, references to literals, and module own 
references) were analyzed in the two ways mentioned earlier. The first 
("User:") assumes a two-pointer cache, one for reading and one for writing. 
The second ("AC#:") uses the PDP-10 AC's as allocated by MPL. A fault 
occurs when a pointer reference occurs to memory and the value which 
appears in the pointer falls in a different pagegroup from the preceding 
reference using that same pointer. 

(5) "Total mem refs" are computed by summing the following quantities: 

(a) non-call/return instructions 
(b) non-call/return memory refs 
(c) 10*local calls 
(d) 16*external calls 
In (c) the 10 comes from: 

    call 
        allocate new frame      3 (best case) 
        save PC                 1 
        save Freg               1 
    return 
        restore Freg            1 
        restore PC              1 
        de-allocate frame       3 

In (d) the 16 comes from the 10 in local call/return plus: 

    call 
        save Creg, Dreg         2 
        load new Creg, Dreg     2 
    return 
        restore Creg, Dreg      2 

The "best case" assumption in frame allocation of (c) is probably the most 
common case. 

(6) Faults (in both cases) are computed by multiplying the module faults 
by 2 (one load of Creg for call and one for return) and adding that to the 
faults incurred by the relevant Ureg discipline. The fault rate is computed 
by dividing the faults by the total memory references. 

(7) The above experiments were first run with the assumption that Mesa 
data modules (where the Uregs sometimes roam) were one page in size. The 
emulator was then modified to treat each user data module as a 
multi-page pagegroup. The effect, however, was negligible. 
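
The arithmetic of notes (5) and (6) can be checked with a short sketch. The function names are ours; the inputs are the example (A) figures, with the partly illegible digits of the non-call counts read as 8612942 and 6770596, the reading under which the sum reproduces the reported total of 20592966.

```python
def total_mem_refs(noncall_inst, noncall_refs, local_calls, external_calls):
    """Note (5): each local call/return costs 10 references, each external 16."""
    return noncall_inst + noncall_refs + 10 * local_calls + 16 * external_calls


def fault_rate_percent(module_faults, ureg_faults, total_refs):
    """Note (6): two Creg loads per module fault, plus the Ureg faults."""
    faults = 2 * module_faults + ureg_faults
    return 100.0 * faults / total_refs


# Example (A): the compiler run.
total_a = total_mem_refs(
    noncall_inst=8612942,
    noncall_refs=6770596,
    local_calls=155130,
    external_calls=228633,   # "module faults"
)
rate_a_two_cache = fault_rate_percent(228633, 687537, total_a)  # "User:" faults
rate_a_acs = fault_rate_percent(228633, 681166, total_a)        # sum of AC# faults
print(total_a, round(rate_a_two_cache, 2), round(rate_a_acs, 2))
```

Both disciplines land between 5.5 and 5.6 percent for this run, which is why the memo treats them as nearly indistinguishable.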



IV. Cost of Base Register Faults 

When a base register faults, i.e. the address computed through it 
does not fall in its page group, the register is reloaded by consulting a hash 
table as proposed by L. P. Deutsch. The cost of this per fault may be 
estimated as follows: 

(1) Using Peter's current micro-code ("Second Try at Lisp Microcode", 
6/20/74): 

The first probe of the hash table requires 13 microinstructions; subsequent 
probes require 15 microinstructions. Once the right entry is found, the 
steps for reloading the base registers depend on new instructions and 
hardware, so this is somewhat less certain - 4 additional microinstructions 
is plausible. If the hash table is 1/2 full, then using double hashing and 
assuming random hash functions, we can expect an average of 1.3 probes 
for a successful search (since an unsuccessful search implies a disk seek, 
we won't consider that here). Hence, the mean number of microinstructions 
per fault is: 13+(.3*15)+4 = 21.5. 

Note: If we use linear reprobing instead of double hashing, then probes 
subsequent to the first require only about 8 microinstructions, while the 
mean number of probes is 1.5. This gives: 13+(.5*8)+4 = 21. That is, it really 
doesn't make much difference - the first probe and terminal computation 
are dominant. Hence, to simplify subsequent discussion, we'll assume linear 
reprobing and a mean number of probes of 1.5. 

(2) Using special hardware: 

(a) Hardware hash computed in the memory interface board and supplied on 
the bus reduces each probe by 5 microinstructions (from 7 
microinstructions needed to obtain a hash to 2). 

(b) Putting the hash table in RAM reduces each read of the table by 2 
microinstructions (from 4, of which one is certainly overlapped, to 1). 

(c) Putting everything on the memory interface. The best possible probe 
sequence would seem to be (i) compute initial probe (ii) start fetch from 
RAM (iii) get word from RAM (iv) compare against the virtual page number 
sought. A success would be followed by 2 cycles to reload the bounds and 
mapping registers. 

For these four possible organizations, the number of microinstructions 
needed to handle a fault is: 

(a)   8+(.5*10)+4 = 17 
(b)   11+(.5*8)+4 = 19 
(a&b) 6+(.5*6)+4 = 13 
(c)   4+(.5*5)+2 = 8.5 
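
The all-microcode estimates in (1) follow one pattern: the first probe, plus the expected number of further probes times their cost, plus the reload. A minimal sketch, with a function name of our own choosing and the linear-reprobing cost taken as 8 microinstructions per later probe (the value that makes the stated total of 21 come out):

```python
def microinstructions_per_fault(first_probe, later_probe, mean_probes, reload_cost):
    """Expected microinstructions to service one base register fault."""
    extra_probes = mean_probes - 1.0          # probes beyond the first
    return first_probe + extra_probes * later_probe + reload_cost


# Double hashing: 13 for the first probe, 15 for each later one, 1.3 probes.
double_hashing = microinstructions_per_fault(13, 15, 1.3, 4)

# Linear reprobing: cheaper later probes, but 1.5 probes on average.
linear_reprobing = microinstructions_per_fault(13, 8, 1.5, 4)

print(round(double_hashing, 1), round(linear_reprobing, 1))
```

The two totals differ by half a microinstruction, since the first probe and the reload dominate either way.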



From the fault rate and number of microinstructions per fault, the 
performance degradation caused by faults can be computed. Simple 
instructions on the Alto in Nova emulation mode require about 1200 ns per 
memory reference (i.e. computation is about 1/3 non-overlapped with 
memory). This is probably low over the entire mix, but in the absence of 
reliable data, we'll take this as typical. The time spent in processing faults 
per unit of computation time is: faults/memory ref * microinstructions/fault 
* 170 ns/microinstr * memory ref/1200 ns. 

Thus the percentage degradation for each of the hardware 
organizations and each of the above fault rates is as follows: 






                           FAULT RATE 
Organization              5.6       3.0 

all microcode            17.0%     9.1% 
(a) hardware hash        11.0%     7.3% 
(b) RAM table            15.0%     8.0% 
(a&b)                    10.3%     5.6% 
(c) all hardware          6.7%     3.6% 



Considering the estimates used in various steps, these numbers are best 
treated as accurate only to within a factor of 1.6 or so either way. 
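
The degradation formula can be evaluated directly. The sketch below is ours (function and parameter names are illustrative); it uses the 170 ns microinstruction time and 1200 ns per memory reference given above, and the all-microcode cost of 21.5 microinstructions per fault.

```python
def degradation_percent(fault_rate_percent, microinstr_per_fault,
                        ns_per_microinstr=170.0, ns_per_mem_ref=1200.0):
    """Fault-handling time as a percentage of computation time.

    fault_rate_percent is faults per 100 memory references, so the result
    is already a percentage.
    """
    return (fault_rate_percent * microinstr_per_fault
            * ns_per_microinstr / ns_per_mem_ref)


all_microcode_high = degradation_percent(5.6, 21.5)   # compiler-like fault rate
all_microcode_low = degradation_percent(3.0, 21.5)    # formatter-like fault rate
print(round(all_microcode_high, 1), round(all_microcode_low, 1))
```

These come out within a tenth of a percent of the "all microcode" row of the table; the other rows follow by substituting the corresponding microinstruction counts.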



V. Page Faults 

Statistics on the expected number of page faults were gathered 
using a model similar to H. Sturgis' ("Some Statistics for Virtual Memory 
Fans, Part 2", 7/3/74). In brief, this models an LRU page replacement 
algorithm for all possible core buffer sizes as follows: A queue of page 
numbers is maintained, with the i-th most recently referenced page in the 
i-th position. A vector C of integer counts is maintained in parallel. On 
each memory reference, the queue is searched for the referenced page. 
Suppose it is found to be the j-th page in the queue; then the j-th position 
of C is incremented by 1 and the page is moved to the front of the queue. 
If an LRU page replacement algorithm is used with a paging buffer of k 
pages, then the number of page faults is the sum C[k+1] + C[k+2] + 
C[k+3] + ... 

Note: This model takes into account none of the following: choice of dirty 
vs. clean pages for replacement, page groups, types of references to 
memory. 
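
The queue-and-counts model above can be sketched in a few lines. This is an illustration rather than the program actually used: the function names are ours, and because the description does not say how the first reference to a page is counted, the sketch tallies such "cold" references separately.

```python
def lru_stack_counts(page_refs):
    """One pass over a page reference string, for all buffer sizes at once.

    Returns (counts, cold): counts[j] is the number of references that
    found their page at 1-based position j of the LRU queue, and cold is
    the number of first-time references (page not yet in the queue).
    """
    queue = []     # most recently referenced page first
    counts = {}
    cold = 0
    for page in page_refs:
        if page in queue:
            j = queue.index(page) + 1
            counts[j] = counts.get(j, 0) + 1
            queue.remove(page)
        else:
            cold += 1
        queue.insert(0, page)   # move (or add) the page to the front
    return counts, cold


def faults_for_buffer(counts, cold, k, include_cold=True):
    """Page faults for a k-page LRU buffer: C[k+1] + C[k+2] + ..."""
    warm = sum(c for j, c in counts.items() if j > k)
    return warm + (cold if include_cold else 0)


counts, cold = lru_stack_counts([1, 2, 3, 1, 2, 3, 4, 1])
faults_k2 = faults_for_buffer(counts, cold, 2)
faults_k3 = faults_for_buffer(counts, cold, 3)
```

The point of the method is that a single pass over the trace yields the fault count for every buffer size, since a reference found at position j is a hit for any buffer of at least j pages.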

Page fault statistics for the two runs discussed in Section III are as 
follows: 
(A) Compiling NEWMPLEXP.NLS 

Memory References Considered: 15,383,534 

# pages in buffer    # page faults    fault rate 

        10              448,101       2.91 10^-2 
        20              259,541       1.69 10^-2 
        30              161,011       1.05 10^-2 
        40               86,410       5.62 10^-3 
        50               56,300       3.66 10^-3 
        60               36,396       2.37 10^-3 
        70               22,883       1.49 10^-3 
        80               15,641       1.02 10^-3 
        90               10,713       6.96 10^-4 
       100                7,622       4.95 10^-4 
       110                5,550       3.61 10^-4 
       120                3,657       2.38 10^-4 
       130                2,145       1.39 10^-4 
       140                1,160       7.54 10^-5 
       150                  565       3.67 10^-5 
       160                  310       2.02 10^-5 
       170                  132       8.58 10^-6 
       180                   84       5.46 10^-6 
       190                   67       4.36 10^-6 
       200                   55       3.58 10^-6 
       210                   36       2.34 10^-6 
       220                   23       1.50 10^-6 
       230 












(B) Formatting of ch7 

Memory References Considered: 13,915,180 

# pages in buffer    # page faults    fault rate 

        10               15,404       1.11 10^-3 
        20                6,037       4.34 10^-4 
        30                3,042       2.19 10^-4 
        40                1,335       9.56 10^-5 
        50                  925       6.69 10^-5 
        60                            4.09 10^-5 
        70                  392       2.82 10^-5 
        80                  170       1.22 10^-5 
        90                   30       2.16 10^-6 
       100                   14       1.01 10^-6 
       110 




Note: Memory References Considered in each case were the non-call 
memory references, i.e. the sum of the classes "Non-call inst" and 
"Non-call memory refs". This was done to maintain consistency with the 
data in Section III. However, it produces results with unduly high fault 
rates since call sequences will require memory references for moving data 
which are well-behaved with regard to paging. If, in fact, no faults were 
caused by these memory references then the fault rates would be lowered 
by about 1/3 for case (A) and 1/5 in case (B). 

To a first approximation, each Maxc page of 512 36-bit words 
corresponds roughly to one Alto page of 512 16-bit words, since integers fit 
in one word and it is anticipated that each instruction will fit in one word. 
(This neglects characters, real numbers, large pointers, etc.) With 60k of 
main memory used for buffers, this gives 120 pages. 

The effect of the fault rate on overall performance can be assessed in 
two ways: 

(1) Elapsed time 

If a portion of the disk is used for paging (like the Maxc "drum"), then the 
access time on the Model 44 disk is about 30 ms and about 60 ms on the 
Model 31. Consider case (A) as an example. Computation time is about 
(15*10^6)*(1.2*10^-6) = 18 seconds. Given 120 pages for buffers, the 
number of faults is 3,657 and the time spent in paging with a Model 44 is 
(3,657)*(30*10^-3) = 110 seconds. 

(2) Comparison with Tenex 

A second way of assessing the effects of page faults in the proposed 
Mesa/Alto machine is in comparison with paging in the current Mesa 
implementation on Tenex. As the average time to access a "drum" page on 
Tenex is 42 ms, the effective device speeds may be regarded as 
essentially the same. Cases (A) and (B) were each rerun twice under 
different load averages and the actual number of page faults (PGSTAT + 1) 
was obtained. 

                   load < 2    load > 6 

(A) compiler          432        2987 
(B) formatter         487        1006 



What conclusions can one draw? A medium-size well-behaved 
program such as the formatter, example (B), may be expected to fit in core 
and never page fault. Time waiting for the disk is only that required to 
load pages. For the example run of (B), we have 

    Alto upper bound of time to load pages     3.3 seconds 
    Alto compute time (estimate)              16.7 seconds 
    Tenex time for page faults                20.4 seconds 
    Tenex compute time (estimate)             16.7 seconds 

That is, Alto Mesa might be expected to run about twice as fast for this run 
- due to its lower paging needs. 
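
The timing estimates in this section are simple products, and a short sketch makes the comparison explicit. The function name is ours, and it is an assumption that the 20.4 second Tenex figure corresponds to the 487 faults observed at load < 2 with a 42 ms access time; the other inputs are taken from the text.

```python
def paging_seconds(page_faults, access_ms):
    """Time spent waiting on the paging device."""
    return page_faults * access_ms / 1000.0


# Case (A) on an Alto with 120 buffer pages and a Model 44 disk.
compute_a = 15e6 * 1.2e-6                 # about 18 seconds of computation
paging_a = paging_seconds(3657, 30)       # about 110 seconds of paging

# Case (B): Tenex at load < 2 versus the Alto upper bound.
tenex_b = paging_seconds(487, 42) + 16.7  # paging plus compute
alto_b = 3.3 + 16.7                       # load time plus compute
speedup_b = tenex_b / alto_b
print(round(compute_a), round(paging_a), round(speedup_b, 2))
```

The ratio for (B) comes out a little under 2, in line with "about twice as fast", while for (A) paging time at 120 pages is roughly six times the compute time.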
The compiler, example (A), is a large program for which there has 
been, to date, no major attention to obtaining good paging properties. As it 
stands, its page fault rate implies a performance of Tenex with a load 
average somewhat larger than 6. That is, with 120 pages, the number of 
faults is 3,657. Observe that with 150 pages, the number of faults drops to 565. 
Considering the ratio of compute time to paging time, there is a critical 
knee in the curve at around 150 pages. It is anticipated that with a 
moderate amount of effort, the compiler could be reconfigured to shift the 
curve and bring the knee down to 120. With 565 faults, Alto Mesa 
would be comparable to Tenex Mesa with a load average of less than 2. 



VI. Static measurements of Mesa Programs 

A number of static measurements have been collected on Mesa 
programs and several more remain to be gathered. The measurements 
documented here were obtained (for efficiency reasons) by examining the 
PDP-10 object code for Mesa programs. More static measures are being 
collected by Dick Sweet by metering the compiler but those results are not 
yet available. The programs measured consisted of all the Mesa programs 
stored on the <MPS> directory. 

(a) Frame References 

The Mesa frame is the locus for parameters, local variables, and 
(rarely) temporaries (which in this analysis are grouped with locals). The 
purpose of this study is to determine how many bits of offset from Freg are 
needed to address variables in the frame. 

    Bits    %-of-frame-references 

     1            38.6% 
     2            60.1% 
     3            81.9% 
     4            95.4% 

Of course, the most frequently referenced frame variable was not 
necessarily the one with smallest offset in the PDP-10 object code. Frame 
variables are simply allocated in declaration order. So, what happens if 
those most frequently accessed are allocated to the smallest offset 
position? The following table shows the results for sorted frame variables. 

    Bits    %-of-frame-references 

     1            50.3% 
     2            72.6% 
     3            89.0% 
     4            97.4% 

The result of the sorting experiment shows that sorting is only worthwhile 
in the (unlikely) event that one allocates only one or two bits to frame 
offset addressing. 
(b) Constant usage 

The use of constants was measured in the object code. This 
approach had good and bad aspects. On the bad side, one has to be careful 
to account for constants which are implicit in certain PDP-10 opcodes (e.g. 
AOS, JUMPG, HRROI, etc.). On the good side, once object code has been 
produced, the compiler has already done a fair amount of compile-time 
evaluation of constant expressions. The measurement was done as follows. 
Constants in the interval [-16,14] were counted individually. Since many of 
the measured programs did character handling, constants in the range 
[15,127] were lumped into a separate bin (called CHAR). Finally all 
constants less than -15 or greater than 127 were thrown into NEG and POS 
bins, respectively. Here are the results: 

    Constant    #-of-occurrences 

    NEG               105 
    [-15,-2]           72 
    -1                264 
    0                4259 
    1                1705 
    2                 422 
    3                 227 
    4                 143 
    5                 122 
    [6,14]            627 
    CHAR             1259 
    POS              1532 

    Total           10737 

Note that the range [0,1] accounts for 55.5% of the occurrences, [-1,2] 
accounts for 61.9%, and that 81.6% fall in the range [0,127]. If the domain 
of constants is limited to the range [-16,14], then the interval [0,1] 
accounts for 75.4% of the occurrences, the interval [-1,2] accounts for 
84.8%, and the interval [-1,6] accounts for 95.3%. 
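
The whole-population percentages can be recomputed from the bin counts. The sketch below is illustrative: the dictionary keys are our own labels, and it assumes that the unlabeled count of 4259 belongs to the constant 0 and that the 627 bin covers [6,14].

```python
counts = {
    "NEG": 105, "[-15,-2]": 72, "-1": 264, "0": 4259, "1": 1705,
    "2": 422, "3": 227, "4": 143, "5": 122, "[6,14]": 627,
    "CHAR": 1259, "POS": 1532,
}
total = sum(counts.values())


def share(*bins):
    """Percentage of all constant occurrences falling in the given bins."""
    return 100.0 * sum(counts[b] for b in bins) / total


zero_one = share("0", "1")
minus_one_to_two = share("-1", "0", "1", "2")
zero_to_127 = share("0", "1", "2", "3", "4", "5", "[6,14]", "CHAR")
print(total, round(zero_one, 1), round(minus_one_to_two, 1), round(zero_to_127, 1))
```

Over half of all constant occurrences are 0 or 1, which is the kind of skew a compact instruction encoding can exploit with very short immediate fields.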

(c) Procedures 

A couple of static measurements were made for procedures that can be 
compared with dynamic measurements made earlier. Procedure calls were 
partitioned into local and external calls with the result: 

    Local calls:       1868    24.5% 
    External calls:    5746    75.5% 

This compares with the dynamic local/external percentages of 40.4%/59.6% 
and 27.9%/72.1% in examples A and B in Section III. 

The number of parameters for each procedure call was also 
tabulated. 

    #-of-parameters    #-of-calls    percentage 

          0               1009         ?4.?% 
          1               ?100         4?.?% 
          2               20?0         21.8% 
          3                563          6.0% 
          4                105          1.8% 
          5                 70           .7% 
          6                 11            0% 


In dynamic example A, procedures had an average of 1.3 parameters and in 
example B an average of .96 parameters. 

A number of other rather PDP-10 specific static measurements were 
also made which are of little or no interest for Alto-Mesa. The purpose of 
gathering the static data is to provide information helpful in designing a 
compact representation for Alto-Mesa object code. For example, the high 
frequency of one-parameter procedure calls suggests that a two operand, 
single-parameter-function-call instruction might be profitable. Dick Sweet 
is gathering data from the compiler on more complex expressions than 
those which can be conveniently deduced from object code. In particular, 
if "f[a]" is so frequent, it may well be that instances of "b←f[a]" are so 
common that a three operand instruction is warranted. 



